Conversation
Code Review
This pull request introduces a new example, wasm-gating, which demonstrates capability-based routing between baseline and subgroup WebGPU WASM builds. The addition includes a TypeScript implementation, HTML structure, and documentation. Feedback was provided to generalize the comments regarding logit_bias token IDs, as the current descriptions are specific to a particular model (Llama-3.1-8B-Instruct) and could be misleading if the model or tokenizer is updated.
```typescript
const reply0 = await engine.chat.completions.create({
  messages: [{ role: "user", content: "List three US states." }],
  // below configurations are all optional
  n: 3,
  temperature: 1.5,
  max_tokens: 256,
  // 46510 and 7188 are "California", and 8421 and 51325 are "Texas" in Llama-3.1-8B-Instruct
  // So we would have a higher chance of seeing the latter two, but never the first in the answer
  logit_bias: {
    "46510": -100,
    "7188": -100,
    "8421": 5,
    "51325": 5,
  },
  logprobs: true,
  top_logprobs: 2,
});
```
The comments explaining the specific token IDs for "California" and "Texas" are highly model-dependent (Llama-3.1-8B-Instruct). This makes the example less portable and the comments could quickly become outdated or misleading if the model or tokenizer changes. Consider making these comments more generic about the purpose of logit_bias rather than detailing specific token values, or moving such model-specific details to external documentation if necessary.
```typescript
// Example of using logit_bias to influence token generation.
// Specific token IDs and their corresponding words are model-dependent.
logit_bias: {
  "46510": -100,
  "7188": -100,
  "8421": 5,
  "51325": 5,
},
```

```typescript
const modelRecord = webllm.prebuiltAppConfig.model_list.find(
  (entry) => entry.model_id === selectedModel,
);
const appConfig =
```
We also want to enforce `subgroupMinSize <= 32 <= subgroupMaxSize` and `maxComputeInvocationsPerWorkgroup = 1024` for the subgroup wasm path.
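One way to combine these checks is a single gating predicate over the adapter-reported values. The sketch below is hypothetical (the helper name and `AdapterCaps` shape are not from the PR); it assumes the values would come from `adapter.features`, the subgroup size fields exposed by the WebGPU subgroups proposal, and `adapter.limits.maxComputeInvocationsPerWorkgroup`, and is written as a pure function so it can be tested without a real `GPUAdapter`:

```typescript
// Hypothetical capability snapshot taken from a WebGPU adapter.
interface AdapterCaps {
  features: Set<string>; // e.g. names reported by adapter.features
  subgroupMinSize: number; // smallest subgroup size the adapter supports
  subgroupMaxSize: number; // largest subgroup size the adapter supports
  maxComputeInvocationsPerWorkgroup: number; // from adapter.limits
}

// Gate the subgroup wasm path on all of the checks above:
// the "subgroups" feature, a reachable subgroup size of 32,
// and support for 1024-invocation workgroups.
function canUseSubgroupWasm(caps: AdapterCaps): boolean {
  return (
    caps.features.has("subgroups") &&
    caps.subgroupMinSize <= 32 &&
    32 <= caps.subgroupMaxSize &&
    caps.maxComputeInvocationsPerWorkgroup >= 1024
  );
}
```

If any check fails, the caller would fall back to the baseline `.wasm` build rather than erroring out.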
Summary
- Adds an `examples/wasm-gating` example showing how to route between baseline and subgroup WebGPU WASM libraries in WebLLM
- Uses `adapter.features.has("subgroups")` for `model_lib` selection based on WebGPU adapter support
- Selects `-subgroups.wasm` when subgroup support is available

Testing
- `model_lib` switches from `.wasm` to `-subgroups.wasm` when `subgroups` is reported by the adapter
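The suffix switch exercised in testing can be sketched as a small helper. This is an illustrative function (the name and exact rewrite rule are assumptions, not the example's actual code):

```typescript
// Hypothetical helper: derive the model_lib URL from a baseline .wasm
// URL and the adapter's reported subgroup support. Mirrors the routing
// the wasm-gating example performs.
function selectModelLib(baseLibUrl: string, hasSubgroups: boolean): string {
  if (!hasSubgroups) return baseLibUrl;
  // Swap the trailing ".wasm" for "-subgroups.wasm".
  return baseLibUrl.replace(/\.wasm$/, "-subgroups.wasm");
}
```

Keeping the decision in one place like this makes the gating easy to unit-test with mocked adapter capabilities.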